3 research outputs found

    Deep Learning Structural and Historical Features for Anti-Patterns Detection

    Software systems have become a core component of every sector of activity. In the race for economic efficiency, developers are liable to implement suboptimal solutions to the problems they face. These poor design choices, introduced through lack of time and/or experience, are called anti-patterns or “design smells”. Although they do not necessarily affect execution, many studies have highlighted their negative influence on system maintainability. Many approaches for automatically detecting anti-patterns have been proposed. Most rely on static code analysis, but it has been shown that anti-patterns can also be detected by analyzing a system's historical data. However, none of these approaches clearly stands out from the others, and each identifies a different set of occurrences, particularly when the approaches rely on complementary sources of information (i.e., structural vs. historical). Several machine-learning based approaches have attempted to address this problem. However, these approaches appear to face intrinsic limitations. On the one hand, inferring high-level characteristics of systems from raw data requires highly complex models. On the other hand, training such models requires a substantial number of training examples, which are tedious to produce and exist only in very limited numbers. This work leverages machine-learning methods to address these limitations. First, we propose an ensemble method for aggregating several detection tools. We show that such a method achieves markedly better performance than the tools it aggregates and makes it possible to generate training instances for more complex models from a reasonable number of examples. Next, we propose a deep-learning model for anti-pattern detection. This model is based on analyzing the evolution of software metrics. More precisely, we compute the values of selected metrics for each revision of the system under study and train a convolutional neural network to detect anti-patterns from these data. We show that, by relying on both the structural and historical aspects of systems, our model outperforms existing approaches. Our approaches were evaluated on the detection of two popular anti-patterns, God Class and Feature Envy, and their performance was compared with that of the state of the art.

    ABSTRACT: Software systems are constantly modified, whether to be adapted or to be fixed. Under economic pressure, these modifications are sometimes made in a hurry, and developers often implement suboptimal solutions that decrease the quality of the code. In this context, the term “anti-pattern” has been introduced to denote such “bad” solutions to recurring design problems. A variety of approaches have been proposed to identify occurrences of anti-patterns in source code. Most of them rely on structural aspects of software systems, but alternative solutions exist. It has been shown that anti-patterns are also detectable through an analysis of historical information, i.e., by analyzing how code components evolve with one another over time. However, none of these approaches can claim high performance for every anti-pattern and every system. Furthermore, different approaches identify different sets of occurrences, especially when based on orthogonal sources of information (structural vs. historical). Several machine-learning based approaches have been proposed to address this issue. However, these approaches have failed to surpass conventional detection techniques. On the one hand, learning high-level features from raw data requires complex models such as deep neural networks. On the other hand, training such complex models requires substantial amounts of manually produced training data, which is scarce and time-consuming to produce for anti-patterns. In this work, we address these issues by taking advantage of machine-learning techniques. First, we propose a machine-learning based ensemble method to efficiently aggregate various anti-pattern detection tools. We show that (1) this approach clearly enhances the performance of the aggregated tools and (2) our method produces reliable training instances for more complex anti-pattern detection models from a reasonable number of training examples. Second, we propose a deep-learning based approach to detect anti-patterns by analyzing how source code metrics evolve over time. To do so, we retrieve code metric values for each revision of the system under investigation by mining its version control system. This information is then provided as input to a convolutional neural network that performs the final prediction. The results of our experiments indicate that our model significantly outperforms existing approaches. We evaluate our approaches on the detection of two widely known anti-patterns, God Class and Feature Envy, and compare their performance with that of the state of the art.
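
The abstract above describes retrieving code metric values for each revision and feeding them to a convolutional network. As a rough illustration of what such an input might look like, the sketch below arranges per-revision metric snapshots of one class into a fixed-size matrix. The function name, the metric names (`LOC`, `NMD`), the zero-padding policy, and the sample values are all assumptions made for this example; the thesis abstract does not specify them.

```python
from typing import Dict, List

# One snapshot = metric values of a class at one revision (hypothetical format).
Snapshot = Dict[str, float]


def build_history_matrix(
    snapshots: List[Snapshot], metrics: List[str], length: int
) -> List[List[float]]:
    """Return a (length x len(metrics)) matrix, oldest revision first.

    Assumed policy: histories longer than `length` keep only the most
    recent revisions; shorter ones are left-padded with zero rows.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # Keep the most recent `length` snapshots, in chronological order.
    rows = [[float(s.get(m, 0.0)) for m in metrics] for s in snapshots[-length:]]
    # Pad at the front so the latest revision is always the last row.
    padding = [[0.0] * len(metrics) for _ in range(length - len(rows))]
    return padding + rows


# Illustrative values only: a class growing over two revisions.
history = build_history_matrix(
    [{"LOC": 10, "NMD": 2}, {"LOC": 50, "NMD": 5}],
    metrics=["LOC", "NMD"],
    length=3,
)
print(history)
```

A fixed number of rows is chosen here only because convolutional models usually expect inputs of a uniform shape; other conventions (right-padding, masking) would work equally well.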

    A Machine-learning Based Ensemble Method For Anti-patterns Detection

    Anti-patterns are poor solutions to recurring design problems. Several empirical studies have highlighted their negative impact on program comprehension, maintainability, and fault-proneness. A variety of detection approaches have been proposed to identify their occurrences in source code. However, these approaches can identify only a subset of the occurrences and report large numbers of false positives and misses. Furthermore, agreement among different approaches is generally low. Recent studies have shown the potential of machine-learning models to improve this situation. However, such algorithms require large sets of manually produced training data, which often limits their application in practice. In this paper, we present SMAD (SMart Aggregation of Anti-patterns Detectors), a machine-learning based ensemble method that aggregates various anti-pattern detection approaches on the basis of their internal detection rules. Our method thus uses several detection tools to produce an improved prediction from a reasonable number of training examples. We implemented SMAD for the detection of two well-known anti-patterns: God Class and Feature Envy. With the results of our experiments conducted on eight Java projects, we show that (1) our method clearly improves on the aggregated tools and (2) SMAD significantly outperforms other ensemble methods.
    Comment: Preprint submitted to Journal of Systems and Software, Elsevier
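
To make the idea of a learned aggregation of detection tools concrete, here is a minimal sketch that trains a logistic regression on scores produced by several detectors. This is a stand-in technique chosen for brevity, not SMAD's actual model, which the abstract does not describe in detail. The three tool scores per class, the labels, and the hyperparameters are invented for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that the component is an anti-pattern, given tool scores x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_aggregator(
    features: List[List[float]], labels: List[int],
    epochs: int = 2000, lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit logistic regression with full-batch gradient descent."""
    n, m = len(features[0]), len(features)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y  # gradient of log-loss w.r.t. z
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical data: each row holds normalized scores from three detection
# tools for one class; labels are manual tags (1 = God Class, 0 = clean).
scores = [[0.9, 0.8, 0.7], [0.8, 0.2, 0.9], [0.1, 0.3, 0.2],
          [0.2, 0.1, 0.1], [0.7, 0.9, 0.3], [0.3, 0.2, 0.4]]
tags = [1, 1, 0, 0, 1, 0]

w, b = train_aggregator(scores, tags)
print(predict(w, b, [0.85, 0.6, 0.8]))  # aggregated score for an unseen class
```

The point of the sketch is that the aggregator learns how much to trust each tool from a small labeled set, instead of using a fixed vote.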

    Deep Learning Anti-patterns from Code Metrics History

    Anti-patterns are poor solutions to recurring design problems. A number of empirical studies have highlighted the negative impact of anti-patterns on software maintenance, which has motivated the development of various detection techniques. Most of these approaches rely on structural metrics of software systems to identify affected components, while others exploit historical information by analyzing co-changes occurring between code components. By relying solely on one aspect of software systems (i.e., structural or historical), existing approaches miss valuable information, which limits their performance. In this paper, we propose CAME (Convolutional Analysis of code Metrics Evolution), a deep-learning based approach that relies on both structural and historical information to detect anti-patterns. Our approach exploits historical values of structural code metrics mined from version control systems and uses a Convolutional Neural Network classifier to infer the presence of anti-patterns from this information. We evaluate our approach on the widely known God Class anti-pattern and assess its performance on three software systems. With the results of our study, we show that (1) using historical values of source code metrics increases precision; (2) CAME outperforms existing static machine-learning classifiers; and (3) CAME outperforms existing detection tools.
    Comment: Preprint. Paper accepted for inclusion in the Research Track of the 35th IEEE International Conference on Software Maintenance and Evolution (ICSME 2019), Cleveland, Ohio, US
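
The following sketch illustrates, in a very reduced form, how a convolution along the revision axis can pick up trends in metric histories. It is not CAME's architecture: the single hand-picked kernel, the dense weights, and the sample histories are assumptions for illustration, and a real model would learn its weights from data.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = revisions (oldest first), columns = metrics


def conv1d_over_time(history: Matrix, kernel: Matrix) -> List[float]:
    """Valid 1-D cross-correlation along the revision axis.

    The kernel covers len(kernel) consecutive revisions and every metric.
    """
    k = len(kernel)
    if len(history) < k:
        raise ValueError("history is shorter than the kernel")
    out = []
    for t in range(len(history) - k + 1):
        acc = 0.0
        for dt in range(k):
            for m, weight in enumerate(kernel[dt]):
                acc += weight * history[t + dt][m]
        out.append(acc)
    return out


def forward(history: Matrix, kernels: List[Matrix],
            dense_weights: List[float], dense_bias: float) -> float:
    """Conv -> ReLU -> global max pooling -> dense -> sigmoid."""
    pooled = []
    for kernel in kernels:
        activations = [max(0.0, a) for a in conv1d_over_time(history, kernel)]
        pooled.append(max(activations))
    z = sum(w * p for w, p in zip(dense_weights, pooled)) + dense_bias
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked kernel (an assumption): responds to growth of the first metric
# between two consecutive revisions and ignores the second metric.
growth_kernel = [[-1.0, 0.0], [1.0, 0.0]]

growing = [[1.0, 0.0], [2.0, 0.0], [4.0, 0.0]]  # first metric keeps increasing
flat = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]     # first metric is stable

print(forward(growing, [growth_kernel], [1.0], -1.0))
print(forward(flat, [growth_kernel], [1.0], -1.0))
```

With these toy weights, the class whose metric keeps growing receives the higher score, which is the kind of historical signal a purely static snapshot cannot provide.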